Fact distribution in Information Extraction

Author

Mark Stevenson

Abstract

Several recent Information Extraction (IE) systems have been restricted to the identification of facts which are described within a single sentence. It is not clear what effect this has on the difficulty of the extraction task or how the performance of systems which consider only single sentences should be compared with those which consider multiple sentences. This paper compares three IE evaluation corpora, from the Message Understanding Conferences, and finds that a significant proportion of the facts mentioned therein are not described within a single sentence. Therefore, systems which are evaluated only on facts described within single sentences are being tested against a limited portion of the relevant information in the text, and it is difficult to compare their performance with other systems. Further analysis demonstrates that anaphora resolution and world knowledge are required to combine information described across multiple sentences. This result has implications for the development and evaluation of IE systems.

Similar resources

Stochastic Fuzzy Discrimination Information Measure Cost Function in Image Processing

A new cost function based on stochastic fuzzy discrimination information measure is introduced in this paper. Focusing on their significant parts, this cost function is used to find the optimal value of threshold for denoising image. It is, in fact, an extension of fuzzy entropy cost function by the present author. Multivariable normal distribution is used for creating focus on significant part...

Enabling Public Access to Non-Open Access Biomedical Literature via Idea-Expression Dichotomy and Fact Extraction

The general public shows great potential for utilizing scientific research. For example, a singer discovered her ectopic pregnancy by looking up clinical case reports. However, an exorbitant paywall impedes the public’s access to scientific literature. Our case study on a social network demonstrates a growing need for non-open access publications, especially for biomedical literature. The chall...

Mean Drop Size and Drop Size Distribution in a Hanson Mixer-Settler Extraction Column

Automatic Music Genre Identification

Nowadays, automatic analysis of music signals has gained a considerable importance due to the growing amount of music data found on the Web. Music genre classification is one of the interesting research areas in music information retrieval systems. In this paper several techniques were implemented and evaluated for music genre classification including feature extraction, feature selection and m...

A Specialised Verb Lexicon as the Basis of Fact Extraction in the Biomedical Domain

The BioLexicon is a standardised, reusable, lexical and conceptual resource suitable for advanced biomedical text mining. One of the unique features of the BioLexicon is the incorporation of rich syntactic and semantic patterns for a wide range of domain-relevant verbs, which have been acquired semiautomatically from biomedical corpora. Such types of information can be highly beneficial for inf...

Stochastic Comparisons of Probability Distribution Functions with Experimental Data in a Liquid-Liquid Extraction Column for Determination of Drop Size Distributions

The droplet size distribution in the column is usually represented as the average volume to surface area, known as the Sauter mean drop diameter. It is a key variable in the extraction column design. A study of the drop size distribution and Sauter-mean drop diameter for a liquid-liquid extraction column has been presented for a range of operating conditions and three different liquid-liquid sy...

Journal:

Language Resources and Evaluation

Volume 40

Publication date: 2006
